-
Notifications
You must be signed in to change notification settings - Fork 112
[P2P] Native TCP support #593
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
To build: To test: |
|
Nice 👍🏽 This is great. |
|
@YangZhou1997 TCP/EFA support will need integrations into |
|
So far, we keep the engine.h interface, so hopefully uccl_engine.cc will be compatible |
|
NCCL performance is actually extremely high: |
|
I have been playing with TCP support over the break, and now my take is that it is gonna be hard to beat NCCL. There are two reasons:
Because of this, I think we should support TCP in a similar way to TCPX by just layering on top of NCCL. Maybe that would need helps from @DanielDanyang @derekwin Also cc @praveingk @zhongjiechen @MaoZiming |
|
@YangZhou1997 Broadly I Agree.
|
|
@YangZhou1997 I'd like to try implementing TCP based on NCCL, but this time it should be directly under the engine, rather than uccl_engine like TCPX before, correct? |
|
Hi @DanielDanyang , great! Yes, let's try to organically integrate into engine.h/cc. You can create a new branch in uccl and open a PR from there. I would like to keep this pr for the record.
|
|
Hi @praveingk
My understanding is that for TCP, NCCL's solution is already optimal. And it is also pretty challenging to implement efficiently, thus I think we should just use NCCL for TCP transfer. |
|
Thanks @YangZhou1997 for the clarification. |
Description
Please include a summary of the changes and the related issue.
Fixes # (issue)
Type of Change
How Has This Been Tested?
Include any tests here.
Checklist
format.sh.build_and_install.shto verify compilation.